Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors

نویسندگان

  • Lingqiao Liu
  • Chunhua Shen
  • Lei Wang
  • Anton van den Hengel
  • Chao Wang
چکیده

Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has shown to be sufficient for traditional low dimensional local features, e.g., SIFT; and typically, good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher dimensional local features, which have become popular recently. In order to improve the modeling capacity for high dimensional features, it turns out to be inefficient and computationally impractical to simply increase the number of Gaussians. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With certain approximation, this model can be converted to a sparse coding procedure and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed Deep Convolutional Neural Network (CNN) descriptor as a high dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms the traditional GMM based Fisher vector encoding but also achieves the state-ofthe-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Local Feature Encoding for Human Action Recognition with Approximate Sparse Coding

Local spatio-temporal features are popular in the human action recognition task. In practice, they are usually coupled with a feature encoding approach, which helps to obtain the video-level vector representations that can be used in learning and recognition. In this paper, we present an efficient local feature encoding approach, which is called Approximate Sparse Coding (ASC). ASC computes the...

متن کامل

Deep Tensor Encodings

Learning an encoding of feature vectors in terms of an over-complete dictionary or a information geometric (Fisher vectors) construct is wide-spread in statistical signal processing and computer vision. In content based information retrieval using deep-learning classi€ers, such encodings are learnt on the ƒaŠened last layer, without adherence to the multi-linear structure of the underlying feat...

متن کامل

Local Patch Vectors Encoded by Fisher Vectors for Image Classification

The objective of this work is image classification, whose purpose is to group images into corresponding semantic categories. Four contributions are made as follows: (i) For computational simplicity and efficiency, we directly adopt raw image patch vectors as local descriptors encoded by Fisher vector (FV) subsequently; (ii) For obtaining representative local features within the FV encoding fram...

متن کامل

Image classification using spatial pyramid robust sparse coding

Recently, the sparse coding based codebook learning and local feature encoding have been widely used for image classification. The sparse coding model actually assumes the reconstruction error follows Gaussian or Laplacian distribution, which may not be accurate enough. Besides, the ignorance of spatial information during local feature encoding process also hinders the final image classificatio...

متن کامل

Hierarchical Coding Vectors for Scene Level Land-Use Classification

Land-use classification from remote sensing images has become an important but challenging task. This paper proposes Hierarchical Coding Vectors (HCV), a novel representation based on hierarchically coding structures, for scene level land-use classification. We stack multiple Bag of Visual Words (BOVW) coding layers and one Fisher coding layer to develop the hierarchical feature learning struct...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014